Assisted keyword indexing for lecture videos using unsupervised keyword spotting

نویسندگان

  • Manish Kanadje
  • Zachary Miller
  • Anurag Agarwal
  • Roger Gaborski
  • Richard Zanibbi
  • Stephanie Ludi
چکیده

Many students use videos to supplement learning outside the classroom. This is particularly important for students with challenged visual capacities, for whom seeing the board during lecture is di cult. For these students, we believe that recording the lectures they attend and providing e↵ective video indexing and search tools will make it easier for them to learn course subject matter at their own pace. As a first step in this direction, we seek to help instructors create an index for their lecture videos using audio keyword search, with queries recorded by the instructor on their laptop and/or created from video excerpts. For this we have created an unsupervised within-speaker keyword spotting system. We represent audio data using de-noised, whitened and scale-normalized Mel Frequency Cepstral Coe cient (MFCC) features, and locate queries using Segmental Dynamic Time Warping (SDTW) of feature sequences. Our system is evaluated using introductory Linear Algebra lectures from instructors with di↵erent accents at two U.S. universities. For lectures produced using a video camera at RIT, laptop-recorded queries obtain an average Precision at 10 of 71.5%, while 79.5% is obtained for within-lecture queries. For lectures recorded using a lapel microphone at MIT, using a similar keyword set we obtain a much higher average Precision at 10 of 89.5%. Our results suggest that our system is robust to changes in environment, speaker and recording setup. c 2015 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unsupervised Spoken Keyword Spotting and Learning of Acoustically Meaningful Units

The problem of keyword spotting in audio data has been explored for many years. Typically researchers use supervised methods to train statistical models to detect keyword instances. However, such supervised methods require large quantities of annotated data that is unlikely to be available for the majority of languages in the world. This thesis addresses this lack-of-annotation problem and pres...

متن کامل

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

Keyword spotting enhancement for video soundtrack indexing

Multimedia databases contain an increasing amount of videos that are hardly semantically accessed. Among the useful indices that can be extracted from the sound track, the presence of a keyword at some place plays a prominent role. This paper deals with the specificities of such a keyword spotter and the enhancement brought to our previous technique, [1] based on frame labeling. To be useful, s...

متن کامل

Keyword Spotting Enhancement for Video Sountrack Indexing

Multimedia databases contain an increasing amount of videos that are hardly semantically accessed. Among the useful indices that can be extracted from the sound track, the presence of a keyword at some place plays a prominent role. This paper deals with the specificities of such a keyword spotter and the enhancement brought to our previous technique, [1] based on frame labeling. To be useful, s...

متن کامل

Learning Speech-Based Video Concept Models Using WordNet

Modeling concepts using supervised or unsupervised machine learning approaches are becoming more and more important for video semantic indexing, retrieval and filtering applications. Naturally, videos include multimodality audio, speech, visual and text data, that are combined to inferred therein the overall semantic concepts. However, in literature, most researches were mostly conducted within...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pattern Recognition Letters

دوره 71  شماره 

صفحات  -

تاریخ انتشار 2016